Professor or Screaming Beast? Detecting Anomalous Words in Chinese

نویسندگان

  • Wei Liu
  • Ben Allison
  • Louise Guthrie
چکیده

The Internet has become a very popular platform for communication around the world. However because most modern computer keyboards are Latin-based, Asian language speakers (such as Chinese) cannot input characters (Hanzi) directly with these keyboards. As a result, methods for representing Chinese characters using Latin alphabets were introduced. The most popular method among these is the Pinyin input system. Pinyin is also called ”Romanised” Chinese in that it phonetically resembles a Chinese character. Due to the highly ambiguous mapping from Pinyin to Chinese characters, word misuses can occur using standard computer keyboard, and more commonly so in internet chat-rooms or instant messengers where the language used is less formal. In this paper we aim to develop a system that can automatically identify such anomalies, whether they are simple typos intentional substitutions. After identifying them, the system should suggest the correct word to be used.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Design and Implementation of a Software System for Detecting Orthographical or Morphological Errors in Persian Words

This paper presents a new method for analyzing words in the Persian language context to find orthographical and structural errors regardless of the meaning. This technique tokenizes each word in a statement then tries to detect the kind of word, and analyses its correctness in terms of orthography and morphology by means of a lexicon. It should be noted that some words in the Persian language h...

متن کامل

First Language Activation during Second Language Lexical Processing in a Sentential Context

 Lexicalization-patterns, the way words are mapped onto concepts, differ from one language      to another. This study investigated the influence of first language (L1) lexicalization patterns on the processing of second language (L2) words in sentential contexts by both less proficient and more proficient Persian learners of English. The focus was on cases where two different senses of a polys...

متن کامل

The Effects of Phonological Neighborhoods on Spoken Word Recognition in Mandarin Chinese

Title of Document: THE EFFECTS OF PHONOLOGICAL NEIGHBORHOODS ON SPOKEN WORD RECOGNITION IN MANDARIN CHINESE Pei-Tzu Tsai, Master of Arts, 2007 Directed By: Professor Nan Bernstein Ratner Department of Hearing and Speech Sciences Associate Professor Rochelle Newman Department of Hearing and Speech Sciences Spoken word recognition is influenced by words similar to the target word with one phoneme...

متن کامل

Separation Between Anomalous Targets and Background Based on the Decomposition of Reduced Dimension Hyperspectral Image

The application of anomaly detection has been given a special place among the different   processings of hyperspectral images. Nowadays, many of the methods only use background information to detect between anomaly pixels and background. Due to noise and the presence of anomaly pixels in the background, the assumption of the specific statistical distribution of the background, as well as the co...

متن کامل

Anomaly Detecting Within Dynamic Chinese Chat Text

The problem in processing Chinese chat text originates from the anomalous characteristics and dynamic nature of such a text genre. That is, it uses ill-edited terms and anomalous writing styles in chat text, and the anomaly is created and discarded very quickly. To handle this problem, one solution is to re-train the recognizer periodically. This costs a lot of manpower in producing the timely ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008